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Abstract 

Much of uncertainty quantification to date has focused on determin- 
ing the effect of variables modeled probabilistically, and with a known 
distribution, on some physical or engineering system. We develop meth- 
ods to obtain information on the system when the distributions of some 
variables are known exactly, others are known only approximately, and 
perhaps others are not modeled as random variables at all. The main 
tool used is the duality between risk-sensitive integrals and relative en- 
tropy, and we obtain explicit bounds on standard performance measures 
(variances, exceedance probabilities) over families of distributions whose 
distance from a nominal distribution is measured by relative entropy. The 
evaluation of the risk-sensitive expectations is based on polynomial chaos 
expansions, which help keep the computational aspects tractable. 

1 Introduction 

Uncertainty quantification refers to a broad set of techniques for understanding 
the impact of uncertainties in complicated mechanical and physical systems. 
In this context "uncertainty" can take on many meanings, but we follow the 
convention of dividing uncertainty into aleatoric and epistemic categories. In 
short, aleatoric uncertainty refers to inherent uncertainty due to stochastic or 
probabilistic variability. This type of uncertainty is irreducible in that there 
will always be positive variance since the underlying variables are truly ran- 
dom. Epistemic uncertainty refers to limited knowledge we may have about 
the model or system. This type of uncertainty is reducible in that if we have 
more information, e.g., take more measurements, then this type of uncertainty 
can be reduced. However, for many problems where uncertainty quantification 
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is important, the acquisition of data is difficult or expensive. The epistemic 
uncertainty cannot be removed entirely, and so one needs modeling and compu- 
tational techniques which can also accommodate this form of uncertainty. 

Much of the work to date has focused on aleatoric uncertainty and what one 
might call the propagation of uncertainty. Here, one characterizes uncertainty 
concerning one or more elements of a physical system via some probability dis- 
tribution, and attempts to quantify how that uncertainty will be propagated 
throughout the system under its constitutive equations and laws. The sim- 
plest and most straightforward method to characterize this output distribution 
would be to sample from the input probability distribution, solve the system 
equations, and thereby produce output samples (i.e., standard Monte Carlo). 
Although this is very simple conceptually, it can be far from practical, espe- 
cially when it is computationally intensive to construct the mapping that takes 
input variables to output. Straightforward schemes such as Monte Carlo would 
require that the system be solved for each sample of the random variable, a 
situation that is often not practical. In recent years efficient computational 
methods have been developed to calculate particular functionals of the induced 
distribution that may be required for a particular application (e.g., covariances 
or error probabilities). Most recently, polynomial chaos has become popular as 
an efficient method for approximating the distribution of the output variable. 

We note that random variables with a well-defined and given distribution 
are often used in this context even when there is no justification for their use, as 
in the case of modeling error. In other words, it is (perhaps implicitly) assumed 
that epistemic uncertainty can be modeled by aleatoric uncertainty. One reason 
is that, at least as they have been developed to date, most computational tech- 
niques (e.g., polynomial chaos and Monte Carlo) are based on the assumption 
that the user can identify some "appropriate" distribution for each uncertain as- 
pect of the system, regardless of the type of uncertainty, aleatoric or epistemic. 
If one is interested in just basic qualitative properties of the system then this 
may not be a central issue, since virtually any model of uncertainty will give 
information on the sensitivities of the system. However, when the intended use 
of uncertainty quantification is for regulatory assessment or some other appli- 
cation where performance measures are sensitive to distributional assumptions, 
the issue becomes much more important, and one should carefully distinguish 
how one accounts for the two types of uncertainty. 

The aim of the present paper is to describe an approach that (i) logically dis- 
tinguishes those aspects of uncertainty that are treated as stochastic variability 
from other forms of uncertainty, (ii) in cases where a stochastic model is theoret- 
ically valid but for which determination of the distribution is not practical, gives 
bounds for performance measures that are valid for explicitly identified fami- 
lies of distributions, and (iii) is computationally feasible if ordinary uncertainty 
propagation is feasible. The intended audience is broad, including numerical an- 
alysts interested in robust performance bounds as well as applied probabilists 
for whom certain computational aspects of uncertainty quantification may be 
novel. Since a typical reader might not be familiar with the terminology and 
methods from both fields, we have included background material for both the 
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probabilistic and numerical analysis approaches used to make the paper broadly 
accessible. 

The paper is organized as follows. In Section 2 we further discuss the dis- 
tinction between aleatoric and epistemic uncertainty, and explain some of the 
limitations to achieving useful performance bounds in the presence of epistemic 
uncertainty. Section 3 introduces risk-sensitive performance measures and dis- 
cusses their robust properties. Several hybrid forms are introduced that are 
more useful than the simplest form when both aleatoric and epistemic uncer- 
tainties are present. Various properties of the risk-sensitive measures that are 
useful in applications (monotonicity, optimization) are also discussed. Section 4 
presents numerical examples, reviews the computational methods used to eval- 
uate the risk-sensitive performance measures, and finishes with a subsection of 
discussion and conclusions. 

2 Aleatoric and epistemic uncertainties 

As noted in the Introduction, uncertainty can be divided into two categories, 
aleatoric and epistemic. From the perspective of mathematical formulation and 
modeling, aleatoric uncertainty is in some sense simpler. In contrast, epistemic 
uncertainty can mean different things in different contexts. Lack of knowledge 
is an ambiguous term that encompasses many different scenarios. 

To illustrate, we consider an elementary example. Consider a system de- 
scribed by some well-posed partial differential equation (PDE), but with an 
uncertain boundary condition. The boundary condition is expressed in terms 
of a random variable X (e.g., Ux{0, t) = X) with a particular fixed distribution, 
and the numerical analysis problem is to characterize the induced distribution of 
the solution to the PDE at some time and location (e.g., u(xo, to; X)). Suppose 
we know that the boundary condition is properly modeled as a random variable, 
but we do not know the correct value of some parameter in that distribution. 
For example, based on a central limit type argument, one might claim the distri- 
bution is known to be Gaussian, but still unknown are the "true" mean and/or 
variance. In this case, the lack of knowledge is this missing information about 
the parameters of the distribution. The lack of information regarding these 
parameters is a form of epistemic uncertainty. Hence in this example aleatoric 
and epistemic uncertainty are mingled. Assuming that samples are available, 
one could use the empirical mean and variance from data to approximate the 
true mean and variance, and thus reduce this uncertainty. However, in the 
common situation where sampling is necessarily limited, some epistemic uncer- 
tainty is inherent in the model due to practical limitations on data acquisition 
and modeling. 

Another type of epistemic uncertainty is the omission of important aspects 
of the system model. This form is possibly the most difhcult to quantify. For 
example, there might be a hidden random variable in the model. In the PDE 
example we have assumed the boundary condition is modeled via a random 
variable, but there may be other coefficients in the model that are random but 
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treated as constants. It is also frequently true that the mathematical model 
used for the system is only approximately true. For example, the model might 
simplify the geometry of the true system, nonlinearities may be approximated 
by linear relations, etc. In the context of the PDE example, boundary conditions 
of Dirichlet form were chosen, when in fact a more realistic model might use a 
mixed form (e.g., Robin boundary conditions). 

For almost all forms of epistemic uncertainty there is little justification for 
the use of random variables to model the uncertainty. Nonetheless, it is common 
practice to do just that, and in practice randomness and random variables are 
used in many situations to account for errors in the physical model or model- 
ing ignorance. The reasons were mentioned previously-that basic distributional 
properties of the output (e.g., variance) will still provide a reasonable sensitivity 
analysis for the problem, and existing computational methods are largely based 
on aleatoric uncertainty. However, with little justification for the use of any 
particular distribution (or even the use of randomness at all), in more critical 
applications one may insist on rigorous bounds on performance that are valid 
for a particular family of distributions. In the extreme case where no proba- 
bilistic model is considered acceptable, one may wish to only prescribe bounds 
on certain parameters, and then obtain tight bounds on performance over all 
values of the parameters that satisfy the bounds. 

Thus there are many different types of uncertainty that one should account 
for in an analysis of the effects of uncertainty. These include: (i) aleatoric 
with known distribution; (ii) aleatoric with partly known distribution (mingled 
aleatoric and epistemic); (iii) epistemic for which one is willing to model by a 
family of aleatoric uncertainties, and (iv) epistemic where one is only willing to 
place bounds on the uncertainties. As remarked in the Introduction, this paper 
will introduce an approach that allows these uncertainties to be handled within 
a single framework that can exploit computational methods originally developed 
just for the treatment of aleatoric uncertainties. 

3 Duality for exponential integrals 

Our development of performance measures that distinguish forms of uncertainty, 
and which in particular allow for robustness with respect to epistemic uncertain- 
ties, depends on a duality relation between exponential integrals and relative 
entropy. Its first use along these lines but with regard to estimation appears 
to be in [T], and for optimization in [3]. We first state the basic duality re- 
sult, and then define two functionals which will allow aleatoric and epistemic 
uncertainties to be analyzed simultaneously but at the same time differentiated. 

The general duality is stated for random variables that take values in a Polish 
space (i.e., a complete, separable metric space) X. The associated cr-algebra is 
the Borel cr-algebra. A typical example of X for our purposes is a closed subset 
of some Euclidean space M''. Let V{X) denote the collection of probability 
measures on X, and let fi S V{X). Given any v S V^X) that is absolutely 
continuous with respect to /i, we define the relative entropy of v with respect 
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to by 



v{dx) 



whenever \og{dv / d^{x)) is integrable with respect to v. In all other cases 
R^u ll/i) is defined to be oo. 

Relative entropy defines a mapping {v,ijl) — R{v\\^) from 'P{XY to MU oo. 
This mapping has a number of very attractive properties. For example, relative 
entropy is non-negative, with i? (;/ ||/^) = if and only if = /i. In addition, the 
mapping is jointly convex and lower semicontinuous [21 Lemma 1.4.3]. A prop- 
erty of particular interest for our purposes is the following variational formula 
for exponential integrals, which can be considered as an infinite dimensional 
version of the Legendre transform. Given any bounded and continuous function 
F : X and any c G (0, oo). 



A, 



-log / e 



cF{x) 



= sup 

iyeV(,X) 



'-R(iy\\^i)+ / F(x)i^(dx) 
c Jx 



(1) 



For a proof see [21 Proposition 1.4.2]. The duality continues to hold as stated 
if F is bounded from above, and also when bounded from below if one restricts 
the supremum to z/ € 'P{X) for which R{v \\^) < oo [H Proposition 4.5.1]. 
As an elementary example on how such a formula can be useful, suppose that 
is a performance measure (e.g., variance or an error probability), and that 
/i is a model for some random phenomena. We consider /i to be the nominal 
model, e.g., our best guess. However, we are uncertain if this is the correct 
model, and would like a measure of performance that also holds for alternative 
models, but with a penalty for deviation from the nominal model. Ac is just 
such a performance measure, since by (|T]), for any alternative model v the bound 



F{x)v{dx) < Ac + -Riv\\l^) 



applies. Hence we obtain bounds on performance over a family of alternative 
models. The parameter c allows one to balance robustness with respect to 
possible model inaccuracies against tighter bounds. In particular, as c tends to 
0, Ac converges to J-^ F(x)fi{dx), which is the performance measure under the 
nominal model, but in the limit the bound is meaningful only when v = fi. 

Suppose that a bound on performance over a specific family of distributions 
is needed. Let R* denote the maximum of relative entropy with respect to the 
nominal model over this family. Then the tightest possible bound is obtained 
by minimizing 

Ac + -R* 

c 

over c > 0. We show in Proposition|3]below that this function has only one local 
minimum over c € (0,oo], and thus the global minimum is easy to compute. 

Functionals of the exponential form that appears in the definition of Ac are 
sometimes called risk-sensitive because the exponential amplifies the effect of 
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any large values of the performance measure F. In the next two sections we 
consider two different hybrid risk-sensitive functionals that will allow aleatoric 
and epistemic variables to be combined and yet differentiated. 

3.1 Two hybrid forms 

In this section and the next random variables with a known distribution will 
take values in a Polish space X. Variables whose distribution is not known or 
are otherwise of the epistemic variety take values in the space y. 

The performance measure of interest for some given problem is assumed to 
be of the form 

/ / F{x,y)^{dy)ti{dx), (2) 
Jx Jy 

where ^ (resp., 7) is a probability measure on X (resp., y). If X and Y are inde- 
pendent random variables with distributions /i and 7, then F{X, Y) represents 
both the performance measure (e.g., a second moment) as well as the underlying 
physical or mechanical system that maps these aleatoric and epistemic inputs 
into outputs. The analogous ordinary risk-sensitive performance measure is 

Ac=ilog/ f e-^^-'y^-f{dy)fi{dx). (3) 
c JxJy 

Neither of the measures ([2]) or ([3]) differentiate the variables according to type 
(aleatoric or epistemic) and given that the performance measure of interest is 
actually F, the use of a risk-sensitive version of the cost is not well motivated 
for the aleatoric variables. Indeed, use of this measure will give bounds that are 
robust with respect to variations on a distribution that is known, and obviously 
such bounds will not be as tight as possible. 

The first form we consider for a hybrid measure is 

Al=:-\ogf e^^'^^'^'^'y^f'^'^'^^-fidy). (4) 

C Jy 

Note that by Jensen's inequality (applied to the convex function cxp), this is 
smaller than (and in general strictly smaller than) Ac. Using the relative entropy 
representation for exponential integrals given in ([1]) [but with y in place of X and 
letting J-^ cF{x, y)fi{dx) be the cost function], it follows that for any distribution 
e{dy) 

[ [ F{x,yUdx)0{dy)<-R{e{dy)Mdy))+Al. (5) 
Jy Jx c 

This gives a bound on the performance measure for an arbitrary distribution on 
Y, but with the distribution on X equal to the known true distribution. The 
distributions thus play very different roles. In particular, we think of 7 as a 
nominal distribution of Y, which should be distinguished from a possible true 
distribution. The risk sensitive functional Aj, whose numerical evaluation can 
be carried out by a variety of methods (including polynomial chaos expansion 
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as discussed below), is based on the nominal distribution. Through the relative 
entropy duality, it will yield various bounds (depending on c) on a families of 
distributions, with the relative entropy distance the key metric. Ideally one 
would consider the smallest family that contains the "true" distribution (if it 
exists), and get the tightest bound by optimizing over c. With the hybrid 
risk-sensitive formulation, fi is allowed to be both the nominal (computational) 
distribution and also the true distribution. 

The second form one could consider for a hybrid measure is 



C 



X 



log / e^^(-^«)7(dy) 



y 



^{dx). (6) 



Note that by Jensen's inequality (applied now to the concave function log) , this 
is again in general strictly smaller than Ac. The relative entropy representation 
for exponential integrals gives 

F{x, y)e{dy\x) < -R {9{dy\x) Mdy) ) + -\og [ e^^ {dy) . 



Here 6{dy\x) is any stochastic kernel on y given A", which is essentially a con- 
ditional distribution on y given X = x (for the precise definition see [2J page 
35]). Integrating over X gives 



F{x,y)e{dy\x)^i{dx) <- I R (6 idy\x) \\-f{dy)) fiidx) + Al 
ixJy (^Jx 

In the case where 9{dy\x) is independent of x, we obtain 

/ / ^^(a:, y)9{dy)^iidx) < -R {e{dy) Mdy) ) + A^. 
JxJy c 

Note that the second hybrid cost gives a more general bound, in that it 
allows the alternative distribution on Y (i.e., 6{dy\x)) to depend on the value 
taken by X . This additional flexibility comes at a cost, and indeed we will show 
in the next subsection that the following inequalities hold (typically in a strict 
fashion) : 

/ / F(x, y)^{dy)^l{dx) < A] < < A,. (7) 
Jx Jy 

Hence if one is concerned with performance bounds for distributions on Y that 
do not depend on the variable X, then the tightest bound is obtained using Aj. 

Before studying properties of the risk-sensitive measures, we note some ele- 
mentary generalizations that are possible. Suppose that the nominal distribu- 
tion of X and Y is not of product form. In the setting of the first form, we 
could let 7 be the marginal distribution of Y and let fj,(dx\y) be the conditional 
distribution of X given Y ^ y. Using the extended definition 



Al = -\og [ e-/'^'=^(^'^)'^('^^l^)7(dy). 



y 
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we obtain the bound 




yJx 



F{x,y)ii{dx\y)e{dy) < -R{e{dy)\\^{dy)) + A]. 



For the second form, we let ^ be the marginal distribution of X and let ^{dy\x) 
be the conditional distribution of Y given X ~ x. With the definition 



It is indeed sometimes useful to allow the distribution of one type of variable 
to depend on the value taken by the other type of variable. As an elementary 
example, consider the situation where an aleatoric variable appears in a system, 
and while it is known that this variable has a Gaussian distribution with mean 
zero, the variance of the variable is however not known. Then the variance itself 
might be modeled as an uncertainty whose distribution is not known precisely, 
i.e., as an epistemic uncertainty. In this case K\ would be the relevant risk- 
sensitive formulation. An example of this sort is presented in Subsection 14.2.21 

Remark 1 In problems of regulatory assessment epistemic uncertainty can 
pose a unique set of challenges. The collection of data and model validation 
are often particularly difficult or expensive, while at the same time stringent 
bounds on performance might be demanded. In these circumstances, the frame- 
work presented here, wherein tight bounds are obtained for a known family of 
alternative models, would seem very natural. Prior to the collection of data or 
testing of samples to determine compliance, all critical elements of the model, 
including selection of the nominal model, size of the family of alternative mod- 
els allowed by the relative entropy bounds, and the performance measures that 
would be required to hold uniformly for this family, would all be determined 
by negotiation among the interested parties. This would allow various com- 
peting needs (e.g., expense of model validation versus adequate protection of 
consumers) to be balanced according to the interests of the participants. 

3.2 Properties of the risk-sensitive forms 

In this section we prove properties of the two hybrid risk-sensitive forms. In 
particular we derive an inequality relating the two, and study the limit as c — >■ oo. 
For the results to hold as stated we often need F be bounded from below. This 
is a mild assumption. Indeed, standard measures of performance such as second 
moments and error probabilities satisfy this condition. 

The first proposition shows the inequality between the two hybrid forms. 




we obtain 
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Proposition 2 Assume that F is bounded from below and consider the junc- 
tionals defined in (0), 0), 0j and (0). Then the inequalities 0) hold. 

Proof. The only inequality which does not follow from Jensen's inequality is 
Ac < A^. The relative entropy duality as stated in ([T]) applied to A J gives 



A^ = sup 

eev(y) 



F{x, y)fi{dx)e{dy) - -R {9{dy) \\j{dy) ] 



y Jx 



c 



where Viy) is the space of probability measures on y . The same representation 
applied to 



c 



gives 



- log / e-^^-^yhidy) = sup / y)e{dy) - -R {e{dy) ^dy) ) 
c Jy eev{y) Uy c 

The supremum operation in the last display can be done in such a way that 
an optimizing (or near optimizing) is a Borel measurable function of x, i.e., 
a stochastic kernel 0{dy\x) on y given X [5]. Let V{y\X) denote the space of 
such stochastic kernels. Integrating the last display with respect to ji gives 



F{x, y)e{dy\x)ii{dx) ~ - I R {e{dy\x) \\^{dy) ) ii{dx) 



Kl = sup 

eev{y\x) \_Jx Jy c Jx 

Since V{y) C V{y\X), A^ > follows. ■ 

The next proposition shows how to obtain tight bounds on the performance 
over a family of distributions defined in terms of a maximum relative entropy 
B. 

Proposition 3 Consider the junctionals defined in (0), ^ and (0), and let 
D = {c : Ac < oo} (resp., = {c:Ac<oo},i = l,2j. Assume that the 
interior of D (resp., D^) is nonempty. Then Ac (resp., A^.) is differentiable on 
the interior of D (resp., D^). Assume also that F is bounded from below by 
zero. Then c — > Ac (resp., c — > A^.) is nondecreasing for c > 0. Let B > be 
given. Then there is a unique c e (0, oo] at which 

c^-B + Ac (resp., -B + A] 
c \ c 

attains a local minimum, where the statement that the minimum occurs at c = oo 
means that Ac + B / c > Aqo for a well defined limit Arx, and all c < oo. 

Proof. To simplify we give the proof only for the case of Ac and omit the 
y variable. Thus we consider Ac = ^\og e'^^'^^^ ii{dx). Proofs for the other 
functionals are analogous. It is well known that H{c) = log e'^^'^^'' fi{dx) is 
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a convex function taking values in R U {oo}, and strictly convex and infinitely 
differentiable on the interior of {c : H{c) < 00} (see, e.g., [4]). 
Next we prove the monotonicity. Differentiating gives 



H{c) 



Since F > the derivative is non-negative, and convexity implies it is non- 
decreasing. Using 



1 



-H(c) 



1 



(i/(c)-if(0)) 



1 



H {s)ds, 



it follows that Ac = H{c)/c is nondecreasing for c > 0. 
First assume that M = supZ? = 00, and observe that 



dc 



1 



-B 



1 



H{c) 



1 



-B + cH (c) - H{c) 



Since H is strictly convex and H (0) > 0, the mapping /(c) — )■ cH (c) — H{c) 
is monotone increasing and takes the value at c = (see Figure 1). 




Figure 1: /(c) is monotone increasing 

Let K denote the limit of /(c) as c —> 00. li K ^ 00 then there is a unique 
solution to 

cH'{c) - H{c) = B, 

and we are done. The same is true ii < B < K. Hence the only case left is 
when B > K . In this case {B+H{c))/c is monotone decreasing for all c G [0, 00). 



10 



January 14, 2013 



bmce ff(c)/c > H {0)> 0, there is a well-defined limit Aqo that is necessarily 
the minimum. 

Next assume that M = supD e (0, oo). We claim that in this case cH (c) — 
H{c) t OD as c t M, and therefore we can argue just as before. By monotone 
convergence it must be that H{c) f oo as c t A/, which implies that necessarily 
H' {c) t oo as c t A^- To prove the claim, let < /3 < c < M. Then since H' (s) 
is increasing 



H{c) = / H {s)ds < I3H {13) + (c - I3)H (c). 

This implies 

cH'{c) - H{c) > P \h'{c) ~ h\i3) 

Letting M and using that > 0, the claim is proved, thus completing the 
proof of the theorem. ■ 

One of the situations of interest when aleatoric and epistemic variables ap- 
pear simultaneously is the case where all that is known regarding the epistemic 
variables are bounds. It turns out that this problem is well posed (i.e., the rel- 
evant performance criteria are finite) essentially in those cases where Aj^ < oo. 
In fact, when this is the case the value of K]^ can be used to establish optimal 
bounds subject to the constraint on the epistemic variables. For simplicity, we 
assume that the set A appearing in the statement of the following theorem is 
bounded and that 7, the nominal distribution for the epistemic variables, is the 
uniform distribution (in the limit c — > 00 the precise form of the nominal dis- 
tribution is not important, and in fact it is only the support of the distribution 
that matters in the limit). If A is not bounded one can use an appropriate dis- 
tribution whose support is all of A (e.g., the exponential distribution could be 
used for A = [0, 00). The choice of 7 as uniform when A is bounded is simplest 
and also one that is convenient for most uses. A related statement holds for 
A^, but it is not as useful (at least in this setting), since Aj!,^ < A^. 

Theorem 4 Suppose that A <zy is bounded and the closure of its interior and 
that 7 is the uniform distribution on A. Assume that F is lower semicontinuous 
in y for each x £ X and bounded from below. Define the risk-sensitive functional 
Al by ^ and let A}^ ~ linic-s-oo A;i . Then 



sup / F{x,y)fi{dx) = A^. 

y£AJx 

Proof. Since cA^ is convex the limit of Aj is well defined, though it might take 
the value 00. We give the proof for the case A;!,^ < 00. The extension to the 
case A;^ = 00 is straightforward. 

We first prove that the left side of the last display is bounded above by the 
right side. Fix any y £ A° , the interior of A. For small e > let be the 
ball in A about y of volume e, and let M be the volume of A. Let 9{dy) be 
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the measure whose density with respect to Lebesgue measure is 1/e on B^, and 
zero elsewhere. Then 



Rieidy)Mdy)) = [ \og[il/£)/il/M)]eidy) 

= (l/£)log[(l/£)/(l/A/)]. 

Then ([S]) gives 



F{x, y)eidyUdx) < ^ (1/e) log [(l/£)/(l/A/)] + A^. (8) 



'X JBe 

Since y F{x, y) is lower semicontinuous 



liminf / F{x,y)e{dy) > F{x,y). 



Letting first c — >■ oo and then e — in ([5]) , Fatou's lemma gives F{x, y)fi{dx) < 
A^ . Since ^ € is arbitrary, the lower semicontinuity and another use of Fatou 
give the bound for a\\ y £ A. 

For the reverse inequality, we use the fact that the minimizing measure in 
the definition of A^. exists. In fact, 

/ / Fix,y)^i{dx)e:{dy)^-R{e*{dy)Mdy))+Al 
JyJx c 

precisely at 6* defined by 

dl{-) I Ja 

(see [21 Proposition 1.4.2]). By the non-negativity of relative entropy 
A;^ = lim Al 

< lim sup / / F{x,y)ii{dx)ei{dy) 



< sup / F {x , y) iJ.{dx) . 

yeAJx 



4 Examples and computational methods 

To indicate how the robust performance bounds might be used in practice, we 
present some numerical examples, which illustrate the relationship between the 
different risk sensitive integrals and the relevant techniques and computational 
issues. We first review the polynomial chaos techniques that are used to compute 
the risk-sensitive integrals. The reader familiar with this material can skip to 
the next section. 
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Remark 5 Although previously the random variables with known and un- 
known distributions were denoted by X and Y, in the rest of the paper these 
random variables will be denoted by Zi and Z2. This is done to free up x and 
y to be spatial variables in various PDE and related equations that might be 
used to define the mapping F. 

4.1 Review of polynomial chaos methods 

This subsection reviews the polynomial chaos method that we use to compute 
the risk-sensitive integrals. The reader familiar with this material can skip to 
Section 4.2. 

To calculate risk sensitive integrals, one could use Monte Carlo integration, 
which requires evaluating F{Zi, Z2) for many replicas of (^1,^2). This can get 
costly (though not necessarily for the given examples), and so we utilize gener- 
alized polynomial chaos (gPC) methods to approximate F{zi,Z2) and evaluate 
various statistical properties of F{Zi,Z2) (see, for example, |^). We briefly 
recall the mathematical framework of polynomial chaos methods, along with a 
simple example in one dimension. 

Let (f2, ^ , 7^) be a probability space where il is the sample space, A the asso- 
ciated (T-algebra, and V the probability measure on A. Define a random vector 
Z(aj) = {Zi{u!), . . . , Zn{uj)) e on this probability space. Consider a d- 
dimensional bounded domain D C M*^ with boundary dD and let < £ [0, T], T € 
(0,00). We consider random fields u(t,x;Z(a;)) : D x [0,r] x f2 — >■ M that are 
defined by requiring that for a.e. cj € 

£(i, X, u; Z(w)) = f{t, x; Z(w)), {x, t,uj) e D x [0, T] x n, 

subject to the boundary and initial conditions 

B{t,x,u]Z{uj)) = g{t,x;Z{uj)), {x,t,uj) £ dD x [0,T] x n, 

u{t,x\7i{uj)) = uo{x]Z{uj)), {x,t,uj) £ D X {Q} X n. 

Here x = (cci, Xd) G K'*, £ is a linear or nonlinear differential operator, and B 
is a boundary operator. We assume for simplicity that a unique classical sense 
solution exists to the differential equation and/or boundary conditions. Note 
that for the first two examples of the last subsection the problem involves only 
time dependence, and hence there is neither a spatial variable nor a boundary 
condition. 

We assume that {Zi}fL^ are independent random variables, with support 
{rj}^-^ and with probability density functions {pi(.Zi)}^i, respectively. Hence 
the joint density is p{z) =IlfLiPi{zi). Let fl be the canonical space x^^Ti, 
in which case Z(u;) = uj and we identify uj with z. Thus the equations above 
become 

C{t,x,u;z) ^ f{t,x;z), {x,t,z) e D x [0,T] x ft, (9) 
subject to the boundary and initial conditions 

B{t,x,u;z) = g{t,x;z), {x,t,z) e dD x [0,T] x n, (10) 
u{t,x;z) = uo{x;z), {x,t,z) e D x {0} x ^l. 



13 



January 14, 2013 



We consider approximating the mapping as a function of the stochastic variable, 
i.e., z — >■ u(t, X] z) at some particular [t, x), via a finite sum of orthogonal basis 
functions. 

We first define finite dimensional subspaces for L'^i^i) according to 

Wf^ = {^v:T,^R:v e span (2,)},^^=o} ' * = 1' ^• 

Here Pi represents the highest degree of the polynomial basis function, and 
{(f'i.mizi)} are a set of orthonormal polynomials with respect to the weight pi, 
i.e., for ni ^ n 

(t)i,miZi)(f>i,n{zi)piiZi)dz^ 

and 

4'ln{Z'i)Pi{zi)dZi = 1. 

The orthonormal basis representation is determined by the probability density 
function p,. For example, if the density is uniform or Gaussian, then Legendre or 
Hermite orthogonal polynomials, respectively, are used. A table of polynomial 
basis functions and their respective distributions is listed at the end of this 
section (also see [B]). A finite dimensional subspace for L^(r), where F = 
Fi X • • • X Fat = n, can either be defined as 

W^= (g) Wf' 

\p\<p 

where the tensor product is over all combinations of the multi-index P = 
(Pi, . . . ,Pjv) e with |P| = P, < P, or 

N 
i=l 

Thus, is the space of A^-variate orthogonal polynomials of total degree at 
most P, whereas is the full tensor product of the one-dimensional poly- 
nomial spaces with each highest degree P. Note that dim{W^) = (^+^) and 
dim(VFj^) = (P + 1)", and that for large N, (^+^) < (P + 1)^. Since our 
examples only consider A^ = 2, we will use the full tensor product space, W^. 
Let $j(z), j = 1, . . . , A/ denote the elements of W^, where M = dim(W^^). 

There are two standard methods for constructing gPC approximations, re- 
ferred to as the stochastic Galerkin method (see, e.g., [6]) and the stochastic col- 
location method (see, e.g., [7j). Since the calculations presented below use only 
the collocation method, we restrict discussion to this method. With stochastic 
collocation, we first consider an approximation to the solution u{t, x; z) in terms 
of a Lagrange interpolant of the form 

v{t,x;z) =^v{t,x;Zk)'Lk{z), (11) 
fe=i 
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where Lfe(z) is a Lagrange polynomial of degree diin(Wj^), satisfies Lfc(z;) = Ski, 
and {zk}kZi ^re a set of prescribed nodes in tlie A^-dimensional space F. We 
require that the residuals 

R{t,x;z) = C{t,x,v;z) — f{t,x;z) 
R^'"{t,x;z) = B{t,x,v;z) — g{t,x;z) 
R'^{x;z) = v{0,x]z) ~ uo{x;z) 

vanish at the collocation points {zfe}^]^, i.e., 



C(t, X, 


v{t,x;zk] 


);zfc) - 


- f{t,x;zk) 


= 0, for k = 


1,., 


..,7Vp 


B{t,x, 


v{t, X] Zk 


);zfc) - 


- 9{.t,x;zk) 


^ 0, for k = 


1,., 


..,7Vp 




«(0, 




- Uo{x\Zk) 


= 0, for k = 


1,., 





Thus, the stochastic collocation method produces a set of Np decoupled equa- 
tions, where each value of v[t, x; Zk) coincides with the exact solution u{t, x; z) 
for the given z^, since both satisfy the same equations. 

Once the values of {v{t, x;Zk)}^li have been determined at the collocation 
points, one can construct a gPC approximation in the orthogonal basis repre- 
sentation in the form 

M 

v{t,x-z) = ^f)j(f,x)$j(z), (12) 

which, under certain conditions, will be equivalent to the Lagrange basis formu- 
lation. The coefficients {ujl^^^j^ can be determined by inverting a Vandermonde 
matrix, where invertibility is dependent on the choice of collocation points. 
If one chooses quadrature or cubatur^ll points for the collocation points, then 
invertibility is guaranteed. Furthermore, if the cubature rule is exact up to poly- 
nomials of degree 2Af , inversion of the Vandermonde matrix is not necessarjo, 
and the coefficients {wjl^^j^ can be computed by evaluating 

Vj{t,x) = ^v{t,x;zk)^j{zk)wk, (13) 



fc=i 



where {wk}^Zi respective weights according to the choice of quadrature 

or cubature points. 

Statistical information is easy to obtain from the form (jl2[) . For example. 



M 

E4v{t,x;z)] = I \ ^Vjit,x)(S>j{z) | p{z)dz ^vi{t,x) 



^cubature points just refers to quadratures in more than one dimension. 
^This is especially useful if the Vandermonde matrix is ill-conditioned. 
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and 




as a consequence of the orthonormality. Other statistical information, such as 
higher moments and sensitivity coefhcients, are also easily calculated (see [6]). 
The ease with which one can calculate such moments explains why orthogonal 
representation (|12p is preferred over the Lagrange form ([Tl]) . 

To illustrate the approach, we look at a simplified version of Example [T] Let 
9(^2) = wq, so that the only uncertainty is in the decay rate fc, which depends 
on the random variable Z. The simplified problem becomes 

^u{t) = -k{z)u{t), u{0) = Uq, 
at 

where u{t; z) denotes the solution parameterized by z, with z a point in the range 
of Z . Requiring that the residuals vanish at the collocation points {zn}^Zi gives 

R{t; Zn) = ^^(i; Zn) + k{Zn)v{t; Zn) = 

and 

i?^^(z„) =w(0;z„)-uo =0. 

Note that there are indeed Np decoupled equations, which are solved indepen- 
dently for each n. Here the solutions are exact: 

w(t;z„) =uoe^'=(""'*, for = l,...,Np. 

One can then obtain the approximation (|12|) with respect to the orthogonal 
basis, assuming that the appropriate collocation points and weights have been 
chosen. 

We illustrate this example with k{z) = z and Z ~ A/'(/i = 0,ct^ = 1). Here 
the {^'j}*^]^ are the Hermite polynomials, which are orthonormal with respect to 

the weighting function p{z) — e^'-^^^^"/^'^^/cr-\/27r with support F = (—00, +00). 
Figure [2] shows the stochastic collocation approximation v{\,z) at t = 1 as a 
function of z, with Np = 2,3,4,5, versus the exact solution Uexact{^, z) = e~^. 
Note that the collocation approximation is an interpolation of the exact solution 
at Np points. 

Figure [3] shows the relative error between the exact and approximated mean 
and variance in a semilog plot. The linear decrease in relative error on the 
semilog plot implies an exponential decay in the error. Errors become a constant 
at the limits of the accuracy of the numerical scheme. 
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Figure 2: Interpolated solutions versus the exact solution at t = 1. 
4.2 Numerical examples and interpretation 

We now discuss numerical calculation of the risk-sensitive integrals. For conve- 
nience, we recall the three types of risk-sensitive integrals introduced in Section 
13.11 the ordinary risk sensitive cost Ac and the two hybrid integrals and A^: 



Ac = 


ilog 




A^ = 


ilog 




-/ 





^{dz2)fJ-{dzi), 



(14) 



Zi JZ2 



1-1 / cF(zi,Z2)ti(dzi) I J N 

-log / e-*^! 'j(dz2), 



Z2 

log 



^cF{zi,Z2) 



Z2 



l(dz2) 



^(dzi). 



where Zi , Z2 are the range spaces for the random variables Zi , Z2 , respectively. 
Recall the relationships A;!; < A^ < Ac. It is assumed that the distribution of 
Zi is known, but this is not true for Z2. 

In the stochastic collocation approach, we replace F{zi^Z2) in (fH|) with a 
polynomial interpolant. F(zi,Z2), determined by the full tensor product of the 
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Figure 3: Relative errors. 



one dimensional collocation points, {z2,i}^^i in Zi x Z2. Thus 

M 

J=l 

where M = Ni^q ■ N2^q , $j is the full tensor product basis, and Fj{t,x) is 
computed using (jl3p . From here, one can efficiently compute the risk sensitive 
integrals ((TJ]) via either Monte Carlo sampling or quadrature. For both meth- 
ods, once the samples or quadrature points have been chosen, one can calculate 
the risk sensitive integrals for different values of c without resampling or choos- 
ing different quadrature points. Hence, it is computationally inexpensive to 
compute the risk sensitive integral for different values of c. Note that for higher 
dimensions, a sparse grid is preferred and, in most cases, necessary for these 
types of computations. It is typical to use the Smolyak algorithm to generate 
these sparse grids, which are based on a one dimensional quadrature rule (for 
example, see Section 4.1.2 in [5] for a more detailed explanation and further 
references). 

In this paper, we opt for the numerical quadrature approach. Since F(zi, Z2) 
can be evaluated very quickly, we can compute the risk sensitive integrals with 
high accuracy and efficiency using a high order quadrature rule. The number 
of quadrature points chosen to compute the risk sensitive integrals need not 
be the same as the number of collocation points used in computing the coeffi- 
cients of the gPC expansion, though the type of quadrature points is the same. 
For example, if we were to use Legendre-Gauss quadrature points to evaluate 
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the gPC coefficients, we would again use Legendre-Gauss quadrature points to 
calculate the risk sensitive integrals. In the examples presented below a much 
larger number of quadrature points are used to compute the risk sensitive inte- 
grals. To avoid confusion, to the set of quadrature points used to evaluate the 

risk sensitive integrals is denoted by ® , i-e., the full tensor 

product of one dimensional quadrature points. In the examples, Ni^q and N2^q 
equal 2^, while Ni,q and N2,q are 8 for smooth F and 12 when F is an indicator 
functions. 

We also implemented the Monte Carlo approach, but observed that numer- 
ical quadrature gave a much better approximation for the same computational 
effort, at least for our examples. It should also be noted that a variation on 
traditional Monte Carlo is needed for the hybrid forms of the risk sensitive in- 
tegrals. For the original risk-sensitive cost function the standard approach 
is to use many independent samples of the pair {Zi,Z2). This produces an 
estimate which has a mean square error of 0(7V-i/2), where TV is the total 
number of Monte Carlo samples. However, for the first hybrid form one 
must approximate the inner integral J-^ cF{zi, Z2)^i{dzi) for every fixed sample 
of Z2. Thus one has to simulate more Zi samples than Z2 samples. Similarly, 
to compute the second form of the hybrid one must approximate the inner 
integral Jy e'^^^^^'^^'^j{dz2) for every fixed Zi. In particular, for both hybrid 
forms one does not simply choose independent samples of (Zi, Z2). 

4.2.1 Independent aleatoric and epistemic uncertainties 

When the aleatoric and nominal epistemic variables are independent, the nu- 
merical quadrature approach would evaluate the risk sensitive integrals via the 
formulas 



Ac 



AJ 



where {^1,^, Wi^ij^L'^^' , {^2,17 w'2,i}^^{' E^re the pairs of quadrature points and 
weights chosen based on the underlying distribution. 

The first example is taken from [5] and involves a simple ODE. The example 
is convenient because analytic solutions to the risk-sensitive integrals can be 
obtained, and then compared with the numerical approximations described in 
the next section. The second example is a random oscillator, where the stochas- 



1=1 3=1 

^ log I ^ e 



Ni., / W2., 
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tic parameters are the spring and dampening coefficients. Tlie third example 
involves a heat equation with random heat capacity and thermal conductivity. 

Example 1 We consider F{Zi, Z2) defined in terms of the solution of the ODE 
ju{t) = ~k{z{)u{t), m(0)=5(z2), 

where k and g are known functions. Letting u{t; zi^ Z2) denote the solution 
parameterized by (zi, Z2), set F{Zi, Z2) = h{u{t; Zi, ^2)), where typical choices 
of h are h{u) ~ u, h{u) ~ u^, h{u) = l{a < u <K\. 

Here we take k{zi) = 2:1,(7(22) — 22, let Zi ^ J7[0,l], Z2 ^ ?7[0,1], and 
compute risk sensitive integrals for f (21,22) = [u(l; zi, 22)]^, and F{zi,Z2) = 
1 {.8 < u(l; 2i, 22) < !}• Hence the goal is to obtain robust bounds on the sec- 
ond moment and the probability that the solution to the stochastic ODE falls 
within the interval [.8, 1] at time t = 1, respectively. Note that the indicator 
function is not smooth, and thus more collocation points are required to obtain 
an accurate approximation. We choose a Legendre polynomial basis for both 
Zi and Z2 and use the Legendre-Gauss quadrature points and weights for each 
dimension. 

Figiues d] and [5] show the approximations as a function of c for F{u) = v? 
and F{u) = 1{.8 < m < 1}, respectively. As expected A;!: gives the best bounds, 
while the original form Ac gives the worst upper bounds of the three. 




0.1 1 ' ' ' ' ' 1 

10 20 30 40 50 60 

c 



Figure 4: Example [U {A*}.^^ for F{u) u^. 

The plots also suggest what happens as c — > 00. The measure Ac is expected 
to yield robust bounds if one is uncertain regarding the distributions of both 
random variables Zi and Z2, and limc^.oo Ac represents the tightest upper bound 
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c 



Figure 5: Example^ A^, {KVi=i ^i^) = H-^ < u < !}• 

on performance if we know nothing about these random variables except for their 
support. In this case, the hmit in Figure [S] appears to be 1, which means that for 
some choice of zi and Z2, m(1; zi, Z2) G [.8, 1]. Hence without more information 
on the distributions of Zi , Z2 we do not obtain information other than this 
from this performance bound. On the other hand, hmc_i.oo -'^c for i = 1,2 is 
strictly less than 1 and thus by Theorem 2] gives a meaningful performance 
bound when the only information available about Z2 is its support. Note also 
that while and A^ seem close for F{u) = u^, the differ considerably for 
F{u) = 1{.8 < u < 1}. 

In Figures |6] and [7] we display the relative error |A* — A^ g^^^J/ |A* g2,jjj,(| 
between the three risk sensitive integrals and the exact solution, which we were 
able to calculate using partially analytical solutions. The relative error is fairly 
small for F{u) = u^, but significantly larger for F{u) = 1{.8 < u < 1}, even 
though more collocation points were used to determine the gPC coefficients. 

Next we consider the problem of computing the value of c which minimizes 
c — >■ is + A* for i = 0,1, 20 Here is a maximum relative entropy distance 
that will be allowed between the "true" distribution, 9{dz2), and the nominal dis- 
tribution, 7(^22)1 a-nd the minimization produces the tightest possible bounds. 
For the example, suppose that the family of distributions for which bounds are 
to be valid includes 6{dz2) ^ beta(a = 1.5,/? = 1.5), which implies B « .0484. 
In this example B < f{c) = Cj^H^{c) — H{c) for some c, where H^{c) = cA^. 
As shown in Proposition [31 this implies there is a unique local minimum for 
c G (0,00). See Figures [5] and ini Alternatively, if i? > /(c) for all c, then 
Proposition [3] implies the minimum is achieved in the limit as c 00. From 

^Note that the risk sensitive integrals are infinite for c = 0. Thus in the numerical calcu- 
lations, we actually compute c starting from .01. 
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c 



Figure 6: Example [1] relative error for Ac, {A^}^^^ with F{u) = u^. 
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Figure 7: Examplefl] relative error for Ac, {A* } .^^ with F{u) = 1{.8 < u < 1}. 

Figure [HI we find a minimum for + Aj of approximately 0.04 at c « 5.12. 
Thus for all distributions on Z2 whose relative entropy distance to U[0, 1] is less 
than 0.0484, the probability that the solution to the random ODE falls between 
.8 and 1 at time 1 is less than 0.04. 

Note that the minimum is at different values of c for the three different 
integrals, but the minimum is always the smallest for the first form of the 
hybrid. In the case when the minimum occurs at some c < 00, the minimum 
is easily calculated, since this is a one-dimensional unconstrained minimization 



22 



January 14, 2013 



0.65 r— 

K 

0.6 - 

0.55 -I 
I 

0.5 

~t OA - ' 
m ' 
0.35 - 1 

0.3 - 

0.25 - 

0.2 - ' 




■¥ 



-K 



* * 



10 12 14 16 18 20 
c 



Figures: Example [H i(B + Kc),{\{B + A*^)}^^^ with = 
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Figure9: ExamplelH i(B + Ac), {^(B + A^)}^^^ with = 1{.8 < u < 1}. 



problem (for example, one can use a golden method search algorithm, such as 
MATLAB's fminbnd function). In the case when the minimum occurs in the 
limit as c — >■ oo, in order to find the minimum, one can iterate A* until the 
successive iterations differ by less than a prescribed tolerance. 

The second example is a random oscillator, where the stochastic parameters 
are the spring and dampening coefficients. 
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Example 2 We consider F{Zi, Z2) defined in terms of the solution of tlie ODE 
d 

—u{t)+-f{z2)-u{t) + k{zi)u{t) = fit), u{Q)^uo, m'(0)=<, 

where k and 7 are known functions, which represent the spring and dampening 
coefficients, respectively. The outcome of interest is whether or not the position 
of the random oscillator falls within a specified range, [a, &], at a specified time, 
tcriticai- Letting u{t] zi, Z2) denote the solution parameterized by (zi,Z2), set 
F(Zi,Z2) = l{a<u{t 

critical 

Here we consider the random oscillator with fc(zi) = 421,7(22) = -^2/5, and 
f{t) = lOcos(lOt) + 3, and choose Zi ^ beta(a = 5,/3 = 5), Z2 ~ beta(Q; = 
5,/3 = 5), tcriticai = 4, and [a,b] = (—00, 2]. Here we use the Gauss- Jacobi 
quadrature points and weights. In this example we are concerned with the 
position of the oscillator at a critical time. Figure [10] plots the risk sensitive 
integrals for Example[2]with F(u) = l{u < 2}. FigurefTTldepicts c — )• -{B+Al), 
where B = R{0{dz2) \hidz2)) « .0587, with e{dz2) ~ beta(a = 10, /3= 10). 
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Figure 10: Example[2l A^, {KV,=i for -P'C") ^ li"" < 2}- 

Finally, the third example involves a heat equation with random heat capac- 
ity and thermal conductivity. 

Example 3 We consider F{Zi, Z2) defined in terms of the solution of the PDE 

d , , fc(w;zi) /n \ / \ 

—u{t,x) = ———u{t,x), u(0,x) = uo{x) 

dt 'm[Z2) dx'^ 

on x e (0, L) with boundary conditions 

-k{u]Zi)-^u{t,0) = q, -^u{t,L)^0. 
dx dx 
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Figure 11: ExampleO i(B + AJ, + A*)}^^^ with F{u) = l{u < 2}. 



Here /c and m are the thermal conductivity and heat capacity, respectively. Note 
that in this example, the randomness affects the diffusivity and the boundary 
conditions. We are interested in the probability that the material exceeds a 
particular temperature. Tcriucah at the time t final and at some point x* € 
[0,L]. Letting u{t; zi, Z2) denote the solution parameterized by (21,22), set 

inah-^ ] ^2) ^ ^critical}- 

Here we consider the random heat equation with (nonlinear) Neumann bound- 
ary conditions. We use k{u, zi) = 21 + (1.5 x 10^^)m, m(z2) = (10^®)z2, q = -35, 
L = 1.90, uo{x) = 25, and Tc„ucai = 980, x* = 0, tf^nai = 1000, and set 
F{Zi,Z2) = l{u{tj:inai,x*; Zi, Z2) > Tcritical}- For the random variables, we 
introduce two independent random variables Zi ^ beta(a = 5,/3 = 5), Z2 ~ 
beta(a = 5,/3 = 5) and let Zi = (2 x lO"^)^! + 3 x 10"^, Z2 = (0.11)^2 + 0.30. 
Figures [T^] and [Ij] plot the risk sensitive integrals as a function of c and illus- 
trate c -> ^{B + Ai), where B ^ R{e{dz2) \\l{dz2)) ~ .0587. and with e{dz2) ~ 
beta(Q; = 10, /? = 10). Again, we use the Gauss- Jacobi quadrature points and 
weights. 

4.2.2 Dependent aleatoric and epistemic uncertainties 

In this section we consider problems where the distribution of Zi can depend 
on the value of Z2, or vice versa. In the case when Zi depends on Z2, we use 
the following inequality, which holds for the alternative of the first hybrid form 
(see the discussion in Section [5TT|) : 

/ / F{zi,Z2)ll{dz^\z2)e{dz2)<-R{e{dz2)h{dZ2))+Kl, 

J JZi c 
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Figure 12: Example [3l risk sensitive integrals as a function of c. 
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Figure 13: Example^ plot of ^(B + A^), {^{B + Al)}l 



where 

Al^^logJ exp cF{zi,Z2)n{dzi\z2)J -f{dz2). 

Recall that the relative entropy term {6{dz2) || 7(^22) ) represents our lack of 
knowledge about the distribution of Z2, i.e., the epistemic uncertainty where we 
want robustness, whereas Zi represents aleatoric uncertainty. A question that 
now arises is whether or not it is still possible to use gPC methods to evaluate 
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this integral. Consider the example with ~ C^[0, 1] and Z\ ~ N{Z2,1), so 
that Zi is Gaussian with variance 1 and mean Z2. Certainly Zi and Z2 are not 
independent random variables. However, we can introduce an auxiliary normal 
random variable Z ^ A/'(0,1), and write Zi = Z + Z2, where Z2 and Z are 
independent random variables. Now we consider G{Z, Z2) instead of F{Zi, Z2), 
and note that G{Z,Z2) = F{Z + Z2, Z2). Furthermore, A J can be written as 



Al 



1 



■log 



exp 



cG{z,Z2)n{dz) -f{dz2) 



Hence, A;!; can be calculated just like Aj via a gPC approximation. Similar 
results hold for the case when Z2 depends on Zi , but the epistemic uncertainty 
is in Z2- 

In general, as long as we can find a smooth, invertible mapping from {Zi, Z2) 
to an independent pair of random variables, one can evaluate the integrals as 
before. Figures [14] and [15] show the performance of risk sensitive integrals for 
dependent random variables as above for Example [T] and with F(u) — l{l/2 < 
u < 1}. 
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Figure 14: Example[Tl A^ for F{u) = l{l/2 < m < 1}. 



4.3 Discussion and Conclusion 



In a recent paper by Xiu et. al. [8], an approach to dealing with epistemic 
uncertainty via polynomial chaos methods was developed. In [5], epistemic 
uncertainty meant that the true distribution of the underlying stochastic pa- 
rameters was not known precisely. Here we compare our method of handling 
this type of epistemic uncertainty to theirs. 
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Figure 15: Example [Tli(J5 + Aj) F{u) = l{l/2 <u< 1}, and 9 - 7V(.8, 1). 



The approach taken in [S] is based on stochastic collocation approximation. 
Specifically, the differential equation of interest was solved on a generic collo- 
cation grid on the sample space of the random variables. This is in contrast to 
standard uses of stochastic collocation for uncertainty quantification, where the 
collocation grid is chosen in accordance with the (assumed known) underlying 
distribution. Of course the reason for the particular choice of basis polynomials 
and quadrature points is to obtain optimal (spectral) convergence with respect 
to the weighted error. In the approach of [S], regardless of what is the "true" 
(and assumed unknown) distribution, one chooses grid points with respect to 
the weighted norm with constant weight (e.g., Legendre- Gauss quadrature 
weights), even if the original space is unbounded. One then solves the dif- 
ferential equation at these collocation points and constructs the interpolating 
polynomial of the approximation. Hence, the gPC approximation converges 
optimally in the norm only if the underlying distribution is uniform. For 
non-uniform random variables the approximation will still converge point-wise 
as the number of collocation points increases (see Section 5 in [5]). 

Once the interpolated approximation is obtained, one can determine the 
mean, variance, probabilities, and so on with respect to various underlying dis- 
tributions on the parameters via Monte Carlo or other methods. The benefits 
are still large, in that one need not evaluate the differential equation for every 
Monte Carlo sample, but instead evaluates a much simpler approximation via a 
finite sum of polynomial basis functions. However, information can be obtained 
only by fixing a choice for the underlying distribution. Also, depending on 
which underlying distribution is used, the errors in the function approximation 
will have a greater or smaller effect. 

In this paper we have developed an approach in which performance bounds 
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can be calculated for the mean, variance, probabilities, and so on with respect to 
all distributions that are within a particular relative entropy distance from the 
nominal distribution, and the bounds are optimal for that class of distributions. 
In addition to providing performance bounds on a class of distributions defined 
by their relative entropy distance, the limit c — > oo is particularly interesting in 
that it gives tight bounds on epistemic uncertainties where one assumes nothing 
about the underlying distribution of certain variables except their support. In 
addition, the method one can efficiently handle both aleatoric and epistemic 
uncertainty simultaneously. More precisely, by computing certain hybrid forms 
of a risk-sensitive expectation we obtain tight performance bounds that distin- 
guish between variables with known distribution and variables with unknown 
distribution or other forms of epistemic uncertainty. In particular, the scenarios 
to which the method applies include (i) aleatoric with known distribution; (ii) 
aleatoric with partly known distribution (mingled aleatoric and epistemic); (iii) 
epistemic for which one is willing to model by a family of aleatoric uncertainties, 
and (iv) epistemic where one is only willing to place bounds on the uncertainties. 
For all the examples we have considered (including those presented in the pa- 
per) , approximation of the required risk-sensitive integrals can be done via gPC 
methods using roughly the same computational effort as would be required to 
compute the corresponding ordinary performance measures using the nominal 
probability distributions. 

A Distribution for Polynomial Basis 

Here we list some of the most common distributions, both continuous and dis- 
crete, and their corresponding gPC basis representations (see [5]). 



B Relative Entropy Formulas 

In this section we evaluate relative entropies for some of the most common types 
of distributions listed in Table [T51 Since most of the distributions listed in this 



Distribution 

Gaussian 
Gamma 

Beta 
Uniform 
Poisson 
Binomial 
Negative Binomial 
Hypergeometric 



gPC polynomial basis 



Krawtchouk 
Meixner 
Hahn 



Hermit e 
Laguerre 

Jacobi 
Legendre 
Charlier 



(15) 
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table fall within the exponential family, we start with the general form 

=/i(x)exp |^^7?,(0)T,(a;)-A(r,i(0),...,r,,(0))^ , (16) 

where Ti{x), h{x), rji{6), and A{6) are known fmictions, 6 = {6i, 9d)^ is the 
parameter of the family. 

Example 4 For Gaussian distributions 9 — (/i, cr)-^, where /i is the mean and 
a is the variance. This is put in the general form by setting 

= {f^,crf, ^"(^'^) ' " '^(^) = (^'^^)^' 

1 



A(.(e)) = ^+lnH = -4 + iln 
producing 

We now state the relative entropy formula for distributions which have a 
probability density function given by the form of exponential family in I\IG^ . 
The relative entropy between a probability measure P with density p{x) and 
another probability measure Q with density q{x) is given by 

R{P\\Q)^ f p{x)\nPP-dx (17) 

whenever P is absolutely continuous with respect to Q, where T is the support 
of p and q. In all other cases it is defined to be oo. 

A simplified version of the relative entropy formula for distributions from 
the same exponential family type is available. Suppose that q(x) and p{x) have 
the same h, r]i^Ti, A, but with different parameters 9i and 62 (note that 61 and 
02 are vectors). If we associate q{x) with di and p{x) with 02, then it can be 
shown that the relative entropy formula reduces to 

R{P\\Q)=i2(m{02)-V^iOl)) f T,{x)pix)dx - A{rjie2)) + Airj{9,)). (18) 
i=i -^r 



Gaussian. If Q = N{iii,ai) and P = N{^2,<^2), then using the fact that 
J Ti(x)p{x)dx = ^2 and J T2{x)p{x)dx = + "'I: S^t 

RiP\\Q)^-l^(i^l-f,l) + ia2-a^)) + In ^ . (19) 
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Beta. For the beta distribution, the parameters in the exponential family are 
given by 

77=(a-l,/3-lf, r=(ln(x),ln(l-x)f , h{x) = 1, A = In ^^^^^ 
where 

i{a + p) 

Another form of the beta distribution, which is more closely related to the Jacobi 
polynomials, is given by 

fx{k;aj) = i^-xni + x)^ -Kx < 1, a,/3> -1, 

where B{a,(3) = r(a)r(/3)/r(a + /3) is the beta function. Note that the relation 
between the two is simply a rescaling and a variable substitution given by a = 
/3— l,/3 — a— 1. We will use the primary form to determine the relative entropy 
formula. Letting P ~ beta(Q!2, /32) and Q = beta(Q;i, 

Ti{x)p{x)dx = ij{a2) - ip{a2 + /32), J T2{x)p{x)dx = -0(/32) - ■0(Q!2 + P2), 

where ^'(2;) is known as the digamma function, or the zero*'' order of the 
polygamma function. Then, 

RiP\\Q) = (a2-ai)(7/;(a2)-V^(«2 + /32)) 



+ (/32-/3i)(V'(/32)-^(«2+/32)) + In 



S(a2,/32) 



Gamma. Here we have 

rj={-(3,a~l)^, T ^ {x,\n{x))'^ , h{x) = 1, A = logr(a) - alog/3 

If P = gamma(Q;2, /32), Q ~ gamma(ai, where a^, I3i are the so-called shape 
parameters, then 

R{P !IQ) = ^(/?2 - I3i) + (ai - a2)(^(ai)) - log(/3i)) + log ■ 
Binomial. Here 

= (-/?, a _ 1)^, T= (x,ln(a;))^, h{x) = 1, A = logr(a) - alog^. 
If P = binomial(n,p2), Q = binomial(n,pi), then 

R{P\\Q) = log ^^, . Ml = np,. 
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Poisson. In this case 

rj ~ log A, T ~ X, hix) = — , A — \ 

x\ 

If P = Poisson(A2), Q = Poisson(Ai), then the relative entropy distance is 
R{P\\Q) = \i-\2 + Aslog^. 

Note that relative entropy formula is invariant under shifting and scaling 
of both distributions simultaneously. In fact, suppose that ijj and its inverse 
are both well defined and measurable, X and Y have distributions P and Q, 
and that ^/'(X) and ipiX) have distributions P and Q. Then Lemma E.2.1] 
R{P IIQ) = R{P \ \Q)- A list of relative entropy formulas follows. 



Distribution 


R((-)2ll(-)i) 


Gaussian 


^^[{^4-^4) + {<^■2-<r^)\ +logf^ 


Beta/Uniform 


(a2 - cti) VP{ct2) - ^(a2 + h)] 

+(/32 - A) - V'(a2 + /32)] + log 


Gamma 


%{P2 - /3i) + (ai - a2) " log/3i] 
+ log[r(a2)/3r/r(«i)/32"M 


Binomial 


logbf (l-p2)^^-"/p^^(l-pir-"] 


Poisson 


Ai -A2 + A2log^. 



References 

[1] R. K. Boel, M. R. James, and I. R. Petersen. Robustness and risk sensitive 
filtering. IEEE Trans, on Auto. Control, 3:451-461, 2002. 

[2] P. Dupuis and R. S. Ellis. A Weak Convergence Approach to the Theory of 
Large Deviations. John Wiley & Sons, New York, 1997. 

[3] P. Dupuis, M. R. James, and I. R. Petersen. Robust properties of risk- 
sensitive control. Math. Control Signals Systems, 13:318-332, 2000. 

[4] S.R.S. Varadhan. Large Deviations and Applications. CBMS-NSF Regional 
Conference Series in Mathematics. SIAM, Philadelphia, 1984. 

[5] D. Xiu. Efficient collocational approach for parametric uncertianty analysis. 
Comm. in Computational Physics, 2:293-309, 2007. 



32 



January 14, 2013 



[6] D. Xiu. Fast numerical methods for stochastic computations. Comm. in 
Computational Physics, 5:242-272, 2009. 

[7] D. Xiu and J. Hesthaven. High-order coUocation methods for differential 
equations with random inputs. SIAM Journal on Scientific Computing, 
27:1118-1139, 2005. 

[8] D. Xiu, J. Jakeman, and M. Eldred. Numerical approach for quantification 
of epistemic uncertainty. Journal of Computational Physics, 229:4648-4663, 
2010. 

[9] D. Xiu and G. Karniadakis. The Weiner-Askey polynomial chaos for stochas- 
tic differential equations. SIAM Journal on Scientific Computing, 24:619- 
644, 2002. 



33 



